Improving Classification Accuracy Using Missing Data Filling Algorithms for the Criminal Dataset

Authors

Cuicui Sun

Chunlong Yao

Lan Shen

Xiaoqiang Yu

Abstract

Predicting crime types with classification algorithms can help identify the factors that affect crime and help prevent it. For various reasons in the data collection process, real criminal datasets often contain a large number of missing values, which seriously reduces classification accuracy. Therefore, a novel data filling algorithm called GMKNN is proposed to improve classification accuracy; it is based on the mutual KNNI (K nearest neighbor imputation) algorithm combined with GRA (Grey Relational Analysis) theory. The algorithm replaces the Euclidean distance formula used in the mutual KNNI algorithm with the Grey relational grade formula, in order to eliminate the effect of noise from the nearest neighbors and to deal effectively with discrete attributes. In a comparison with several popular data filling algorithms on a real criminal dataset with many missing values, GMKNN yields higher classification accuracy, up to 77.837%.
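The abstract does not give the formulas, so the following is only a minimal sketch of the idea it describes: mutual nearest-neighbor imputation in which similarity is measured by a grey relational grade rather than Euclidean distance. It assumes Deng's common formulation of the grey relational coefficient with distinguishing coefficient rho = 0.5, numeric attributes pre-scaled to [0, 1], complete rows as the only donors, and mean filling. The function names `grey_relational_grades` and `gmknn_impute`, the parameters `k` and `rho`, and the fallback to the single most similar row are illustrative assumptions, not the authors' implementation. Discrete attributes, which the paper handles, are left out of this sketch.

```python
import numpy as np


def grey_relational_grades(ref, others, rho=0.5):
    """Grey relational grade of each row in `others` with respect to `ref`.

    Assumes Deng's grey relational coefficient and attributes in [0, 1].
    """
    delta = np.abs(others - ref)              # per-attribute differences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:                            # every row equals the reference
        return np.ones(len(others))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                 # grade = mean coefficient per row


def gmknn_impute(X, k=3, rho=0.5):
    """Fill NaNs using mutual nearest neighbors ranked by grey relational grade."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    incomplete_mask = np.isnan(X).any(axis=1)
    complete = np.where(~incomplete_mask)[0]  # donor rows (no missing values)
    for i in np.where(incomplete_mask)[0]:
        obs = ~np.isnan(X[i])                 # attributes observed in row i
        if not obs.any() or len(complete) == 0:
            continue                          # nothing to compare on; leave as is
        grades = grey_relational_grades(X[i, obs], X[complete][:, obs], rho)
        order = np.argsort(-grades)[:k]       # k most similar complete rows
        mutual = []
        for pos in order:
            j = complete[pos]
            # Neighbors of row j: the other complete rows plus row i.
            pool = np.append(complete[complete != j], i)
            g = grey_relational_grades(X[j, obs], X[pool][:, obs], rho)
            if i in pool[np.argsort(-g)[:k]]:  # row i is also a neighbor of j
                mutual.append(j)
        # Assumed fallback: use the single most similar row if none is mutual.
        donors = mutual if mutual else [complete[order[0]]]
        filled[i, ~obs] = X[donors][:, ~obs].mean(axis=0)
    return filled
```

Requiring neighbors to be mutual means a row only donates values to an incomplete row if each ranks the other among its own k most similar rows, which is how the mutual KNNI idea limits the influence of noisy neighbors.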

Similar resources

Detecting Crime Types Using Classification Algorithms

Criminal behaviors can reflect the characteristics of criminals to a great extent. Predicting crime types from the characteristics of large numbers of criminals is an important part of criminal behavior analysis. In order to get high classification accuracy, three typical classification algorithms, including C4.5 algorithm, Naive Bayesian algorithm and K nearest neighbor (KNN) algori...

ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION

With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing large data sets poses challenges, including high storage requirements and the time required for accessing and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...

Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering

Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used for training is not well distributed. This uneven distribution occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, the methods of solving this kind of prob...

Diagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms

Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities is of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help the clinicians to predict the risk factors that cause the diabetes or pre...

Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features

The heart is one of the most important organs of the body, and heart disease is the major cause of death in the world and in Iran. This is why early, timely diagnosis is essential for preventing and reducing deaths from this disease. So far, many studies have been done on heart disease with the aim of prediction, diagnosis, and treatment. However, most of them have been mostly f...

Publication date: 2016